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Introduction 


In August 2010, the Smithsonian Science Education Center (SSEC), a division of the Smithsonian 
Institution formerly known as the National Science Resources Center (NSRC), received a grant of more 
than $25 million from the U.S. Department of Education’s Investing in Innovation (i3) program for a five- 
year study to validate its Leadership and Assistance for Science Education Reform (LASER) model in three 
regions of the United States: rural North Carolina, northern New Mexico, and the Houston Independent 
School District (HISD). Matching funds to support the study in the amount of more than $5 million were 
obtained from partners in the three regions as required by the Department of Education. 


The independent third-party research evaluation of the LASER model was conducted by the Center for 
Research in Educational Policy (CREP) with technical assistance from Westat and Abt Associates, who 
were provided to i3 grantees’ evaluation partners by the US Department of Education (USDOE). CREP, a 
Tennessee Center of Excellence, is a research and evaluation unit based at the College of Education at the 
University of Memphis. 


Interim evaluation results were reported to SSEC by CREP in annual formal technical reports as well as 
more informally through presentations and written materials. In July 2015, a comprehensive report (The 
LASER Model: A Systemic and Sustainable Approach for Achieving High Standards in Science Education 
Summative Report) was submitted to SSEC containing overall findings, conclusions, and 
recommendations in summary form based on analysis of the final year of available quantitative and 
qualitative data for students in two cohorts of schools: an elementary cohort and a middle school cohort. 
Supporting materials comprising the complete final report included an overview of implementation 
findings related to the five pillars of the LASER model and a report of findings from case studies in 
addition to quantitative analyses of achievement data related to both confirmatory and exploratory 
research questions. 


The current report focuses on the confirmatory and exploratory research questions submitted to i3 for 
the two studies conducted for the LASER i3 validation grant, providing clarifying detail related to 
methodology and instrumentation. The studies were conducted to answer two confirmatory research 
questions: 


After three years of participation in the study (i.e., after Year 3), do schools containing the Grade 3 
elementary school cohort that receive the LASER intervention (i.e., Phase 1 schools) attain higher levels of 
science achievement than schools that do not receive this intervention (i.e., Phase 2 schools) as measured 
by the PASS? 


After three years of participation in the study (i.e., after Year 3), do schools containing the Grade 6 middle 
school cohort that receive the LASER intervention (i.e., Phase 1 schools) attain higher levels of science 
achievement than schools that do not receive this intervention (i.e., Phase 2 schools) as measured by the 
PASS? 
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In addition, the studies were conducted to answer two exploratory research questions: 


After three years of participation in the study (i.e., after Year 3), do schools containing the Grade 3 
elementary school cohort that receive the LASER intervention (i.e., Phase 1 schools) attain higher levels of 
science achievement than schools that do not receive this intervention (i.e., Phase 2 schools) as measured 
by the PASS for the following underrepresented students in STEM? 


a. Students with Disabilities 

b. English Language Learners 
c. Economically Disadvantaged 
d. Females 


After three years of participation in the study (i.e., after Year 3), do schools containing the Grade 6 middle 
school cohort that receive the LASER intervention (i.e., Phase 1 schools) attain higher levels of science 
achievement than schools that do not receive this intervention (i.e., Phase 2 schools) as measured by the 
PASS for the following underrepresented students in STEM? 


a. Students with Disabilities 
b. English Language Learners 
c. Economically Disadvantaged 


d. Females 


Two new versions of the What Work Clearinghouse Procedures and Standards Handbook (Version 2.1 
and Version 3.0) were published after the research design of the study was approved under i3 and data 
collection had begun, in addition to the Reviewer Guidance for Use with the Procedures and Standards 
Handbook Version 3.0, which was published in March 2016, subsequent to completion of this summative 
report submitted to SSEC in July 2015. The current report contains language intended to clarify findings 
and ensure alignment with the most recent version of the handbook (Version 3.0) and the Reviewer 
Guidance document. For example, issues such as level of inference (individual or cluster) and joiners vs. 
stayers were not part of the WWC Handbook at the time this study was first implemented under i3. Asa 
result, considerations such as these were not factored into the original presentation of results. The 
current version of the report reflects language that the WWC would evaluate as part of their review of 
the study findings under the most current version of the handbook. 


Note: As previously indicated, data for the elementary and middle school samples reported in this 
manuscript should be treated as two separate studies for reporting and interpretation purposes--an 
elementary school study and a middle school study (as confirmed by the WWC in the letter in Figure A-1), 
both with cluster-level research questions and inference, but with analysis at the individual (i.e., student) 
level. Subgroup analyses focused on under-represented groups in STEM: Students with a Disability (IEP), 
English Language Learners (ELL), Economically Disadvantaged (FRL), and females. 
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Methodology 


The LASER i3 Validation study utilized a matched-pair, randomized controlled trial (RCT) and was 
designed to meet the What Works Clearinghouse (WWC) criteria “without reservations,” which is the 
highest possible rating. Schools with intact elementary (grades 3-5) and middle school (grades 6-8) 
cohorts were paired and randomly assigned to Phase 1 (immediate implementation) or Phase 2 (delayed 
implementation). Schools in Phase 1 began implementing LASER in the fall of 2011; Phase 2 schools 
served as the control group, receiving a reduced version of LASER following the conclusion of the 
research study. The matched-pair design was utilized to ensure equivalency between groups. 


Baseline equivalence was established with the analytic sample for elementary schools overall and for all 
subgroups and in all but two cases for middle schools (PASS multiple choice and open-ended for the ELL 
subgroup, both of which favored Phase 1 schools). For the middle school study, all sixth graders in the 
impact analyses attended elementary schools the year prior to the inception of this study and were 
therefore not in their middle schools at the time of random assignment of clusters. By WWC definition, 
they were considered joiners (students not enrolled in the school at the time of random assignment). 
There were also joiners included in the elementary study (students whose first year in the study 
school/cluster was the third grade), although this number was relatively small (17.9% of the Phase 1 and 
9.9% of the Phase 2 sample, 14.5% of the total sample). Please also note that while the word "student" is 
used throughout this manuscript in reporting some incidental findings (e.g., percentile ranks) due to the 
individual level analyses, and while tables of sample sizes and outcomes reference samples at both the 
school and individual levels due to the individual level analyses, inferences for both the elementary and 
middle school results should be at the cluster (i.e., school) level as whole schools, not individual students, 
received the intervention. 


Overall and differential cluster-level attrition levels were calculated for the full elementary and middle 
school samples as well as for each subgroup for comparison to the What Works Clearinghouse (WWC) 
cluster-level attrition standards (See Table A-2,Table A-3, and Table A-4 in the Appendix), and analyses of 
aggregate and subgroup data included ANCOVAs with a cluster correction (See Table A-1 in the Appendix) 
(What Works Clearinghouse, 2014). Details are provided in subsequent sections of this report and in the 
appendices. 


Instrumentation 


The evaluation team elected to use the WestEd-developed Partnership for Standards-based Science 
Assessment (PASS at WestEd®) test as the primary measure of student learning. PASS is a standards- 
based test, developed with funding from the National Science Foundation (NSF). The work of PASS builds 
upon research on the properties of science assessment and current approaches for assessment 
development and scoring. For the purposes of this study, PASS was administered at the elementary and 
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middle school levels. It consists of three assessment components at each grade level: 1) selected- 
response/multiple choice items (hereafter referred to in this report as multiple choice or MC), 2) 
constructed response investigations (grades 3-5) or open-ended questions (grades 6-8 (hereafter referred 
to in this report as open-ended or OE), and 3) hands-on performance tasks (PT). All students in the study 
completed the MC component; the OE and PT sections were completed by a sub-group of students in 
focal schools. Table 1 provides a description of each PASS Assessment Component. 


Table 1: PASS Assessment Components 


PASS Assessment Component Description 


Selected Response or Multiple Choice Items (MC) | Items assess students’ understanding of important 
(29 items for both Elementary and Middle school) | scientific facts, concepts, principles, laws, and 


theories. 
Constructed Response Investigations and Open- Students analyze a problem, think critically, 
Ended Questions (OE) (2 items for Elementary conduct a secondary analysis, and apply learning. 
and 6 items for Middle school) They construct explanations using evidence. 
Hands-on Performance Tasks (PT) (6 items for Investigations identifying a problem to solve. 
both Elementary and Middle school) Students use equipment to perform investigations; 


make observations; generate, organize, and 
analyze data; communicate understandings; and 
apply learning. 


Measurement specialists on the PASS development teams conducted equating studies in the design of the 
forms and established the validity and reliability of the assessments. Table 2 below, obtained from 
WestEd, shows score reliabilities and inter-rater agreement calculated for the PASS assessments, including 
selected response/multiple choice items (MC), constructed response/open-ended investigations (OE), and 
hands-on performance tasks (PT). The reliabilities of the item calibrations are given by the Rasch 
equivalent of the Cronbach alpha statistic and are derived from the ratio of the spread of the items over 
the scale to their own root mean squared error (RMSE). The statistic is scaled to stretch from 0 to 1.0. 
Inter-rater agreement is the correlation between the first and second reader on each answer within a task. 


Table 2: PASS Score Reliabilities and Inter-Rater Reliability 


Overall Score Inter-Rater Agreement 
Number of Reliabilities (Constructed Response & 
Students (Cronbach Performance Tasks) 
Alpha) CR-1 CR-2 PT 
Grade 5: Administered at grades 3, 4,&5 7, 429 87 84 87 85 
Grade 8: Administered at grades 6, 7,&8 7,777 92 95 88 91 


To ensure that only students present at the beginning of LASER implementation would continue to be 
tested throughout the course of the study (and would therefore be considered as accurately representing 
their school in terms of science achievement), CREP researchers pre-slugged answer sheets following the 
baseline administration of the PASS MC component. Data from each administration were carefully 
monitored to ensure accuracy of the data set. 
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In both the elementary and middle school cohorts, there were four alternate forms of the PASS MC 
subtest administered over the course of the study: Fall 2011 (pretest), Spring 2012 (posttest), Spring 2013 
(posttest), and Spring 2014 (posttest). The three posttest forms were equated, separately by grade level 
band, with the Fall 2011 (base) form using common items and the Mean/Sigma method based on single 
parameter (item difficulty) IRT Rasch models estimated under the assumption of random items. As the 
OE and PT subtests for both grade band tests were the same on all test forms, no equating was required 
for those subtests. It should be noted that the OE and PT sections were administered beginning in Spring 
2012, so there were only three vs. the four administrations of the MC section. In addition to the 
equating, scaled scores (0-600 scale) were established for the MC subtest to enable analyses of outcomes 
across the different test forms. 


CREP recruited, trained and calibrated scorers for the PASS OE and PT components of the assessment 
using WestEd’s validated training and calibration materials and conducted an adaptation of WestEd’s 
training and calibration process starting with the Spring 2013 administration. Scorers first participated in 
a five-hour training session, focused on either elementary or middle school scoring, where the open- 
ended questions and performance tasks, along with their associated scoring rubrics, were presented and 
discussed. At the end of the training session and prior to scoring any of the actual PASS OE/PT student 
materials, participants independently scored a calibration set of eight OE/PT calibration booklets 
(validated and provided from WestEd). The participants’ results were compared to the 80% percent 
agreement benchmark set to ensure their readiness to become a certified scorer. 


After qualifying as certified scorers, each was randomly assigned to score student test booklets, based on 
the elementary or middle school training they had received. Each time a scorer received a set of 100 
booklets, three unidentified calibration booklets were randomly inserted into the set. When scorers 
returned each set of papers, CREP checked the calibration papers against the pre-determined scores. 
Scorers could not receive a new set of booklets unless they met or exceeded the benchmark calibration 
level. 


Certified scorers signed a non-disclosure agreement and kept booklets secure and confidential until they 
were scored and returned. They scored each question individually and recorded results on student scan 
documents. The same scorers returned to score the following year’s PASS OE/PT materials, and 
completed a second training and calibration process prior to scoring those assessments. 


Population and Context 


The population from which the sample for this study was drawn encompasses three regions (Houston 
Independent School District, rural North Carolina, and northern New Mexico), and represents a total of 
16 school districts. This population of school districts includes more than 325,000 students and 20,000 
teachers, and over 150 district and building level instructional leaders, with more than half of students 
(56.2%) identified as “economically disadvantaged” by the National Center for Educational Statistics 
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(NCES) based on free and reduced lunch status. From this total, schools were nominated for participation 
in the study, and from that nominated list the study sample was created. 


Although these regions, and schools within the regions, are very diverse, commonalities exist across the 
regions that stem from conversations, trends and initiatives taking place at the national level. These have 
had varying levels of impact on the teaching of science in elementary and middle schools, and therefore 
on the implementation of the LASER model during the course of this study. Most notable are the 
national debate around Common Core State Standards and associated testing; the Next Generation 
Science Standards (final draft released in April 2013); new teacher evaluation models; and identification 
and implementation of programs for low performing schools. 


Within each region, unique conditions also existed during the LASER implementation window that had 
potential to impact implementation of the model. In North Carolina, a program called Read to Achieve 
was adopted in July 2012 via a state budget act and became effective during the 2013-2014 school year. 
With mandated summer reading camps and possible retention imposed on students who did not achieve 
acceptable reading levels by third grade, more instructional time was devoted to the teaching of reading 
in the lower grades in many NC schools. In Houston, 11 of the LASER i3 schools (25.6%) were also part of 
that district’s Apollo 20 initiative, with multiple strategies in place to close gaps in achievement and more 
time devoted to testing than was required in other schools. Implementation in Northern New Mexico 
was impacted by the geography of the region: remote mountain school locations affected teacher travel 
to professional development sessions; delivery of materials, particularly live specimens; and access to 
SSEC’s regional support system. Partnership with the Los Alamos National Laboratory (LANL) Foundation 
provided support for schools in that area. 


Baseline data collected from teachers prior to LASER i3 implementation in fall 2011 revealed that most 
students in all three regions received science instruction from their classroom teachers rather than from 
a science specialist (reported by more than 90% of teachers in all three regions) and that these teachers 
did not major in science or science education (Approximately 10% of respondents from New Mexico and 
HISD and 8% from North Carolina reported holding majors in science). Science laboratories were more 
prevalent in Houston schools than in the other regions, with nearly half (49.8%) of teachers reporting 
that their students went to labs to receive science instruction compared to 10.5% in North Carolina and 
1.5% in New Mexico. Time devoted to the teaching of science was also greater in Houston, with teachers 
reporting an average of 3.3 hours per week of science instruction compared to slightly under 2.5 hours in 
North Carolina and slightly over 2 hours in New Mexico. 


When asked at baseline about challenges associated with teaching science in elementary and middle 
schools, teachers’ responses were consistent across all three regions. The greatest challenge (reported 
as “substantial” or “significant” by 70% of NM teachers, 63% of NC teachers, and 59% of HISD teachers) 
was limited time for science instruction. Limited funds for purchasing equipment and supplies and more 
emphasis on English/language arts and mathematics than science instruction were also reported 
challenges in all regions. 
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Sample 


The study sample originally included 139 schools across the three regions. These schools went through a 
nomination and qualification process, were matched based on several school-level demographic and 
achievement variables, and then randomly assigned to Phase 1 (immediate implementation) or Phase 2 
(delayed implementation). After random assignment, changes in school participation occurred within 
each region, and the final sample contained 125 study schools within the 16 districts and encompassed 
approximately 60,000 students, 1,900 teachers, and over 140 district administrators and principals. 
While LASER is a school-level intervention in which all students in the participating schools received the 
treatment, a subsample of 9,000 students in two cohorts (third and sixth graders in 2011-12) was 
followed longitudinally over the three years of the study, and a further subset of focal schools was 
randomly selected to participate in additional components of data collection. A breakdown of the study 
schools by region, phase, and focal status as well as a detailed description of selection methodology has 
been provided in previous annual reports. 


HISD is the largest school district in the study. Participating schools generally served Hispanic and African 
American populations (63.8% and 28.9%, respectively), with most students (88.1%) identified as eligible 
for free and reduced lunch. 


New Mexico LASER schools in the participating school districts ranged in size from 26 to 984 students. 
New Mexico districts served mostly Hispanic, White, and American Indian/ Alaskan populations (48.0%, 
34.8%, and 11.7%, respectively), with over half of students (58.0%) qualifying for free and reduced lunch 
status. 


The sizes of LASER schools in the participating school districts in North Carolina (NC) ranged from 186 to 
930 students. Most of the districts served primarily White and African American populations (60.7% and 
18.1%, respectively), with almost two-thirds of students (63.1%) identified as eligible for free or reduced 
lunch by the North Carolina Public Schools. 


Findings 


Although student achievement gains as measured by traditional standardized tests comprise only one 
component of a successful intervention, it is the single outcome of most interest to many constituencies. 
To obtain valid achievement outcome data, CREP researchers analyzed scores from only students in the 
elementary and middle school cohorts for whom both pretest and posttest (baseline and spring 2014) 
PASS scores were available and established baseline equivalence using these analytic samples. It is 
important to note that all schools identified as Phase 1 are considered to be in the “treatment” group 
regardless of their level of implementation of the LASER model, and that fidelity of implementation 
varied widely across regions and across schools within regions. It should also be noted that the statistical 
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analyses utilized an “Intent-to-Treat” model, where students were included in the Phase 1 or Phase 2 
groups for analysis based on their treatment status at the time of random assignment. 


Important and positive trends between Phase 1 and Phase 2 schools are evidenced in exploratory 
subgroup outcomes related to characteristics commonly agreed upon as most valued by employers. Both 
the OE and PT sections of the PASS call upon students to communicate their knowledge in written form. 
They also engage students in activities associated with critical thinking and problem solving. These 
twenty-first century skills are associated with college and career readiness; it is therefore noteworthy 
that these are the areas of achievement in which Phase 1 schools excelled. It is also important to note 
that the underserved populations of economically disadvantaged and special needs students, as well as 
those for whom English is a second language, seem to have benefited from their experiences with LASER 
as reflected in scores on the PASS. 


PASS results for all three regions combined follow, first for the MC, then the OE and PT sections. A summary 
of the key findings for each set of analyses is presented at the beginning of each section, followed by 
information on the samples included, baseline equivalence between the Phase 1 and Phase 2 groups, and the 
detailed outcomes by grade level (i.e., elementary cohort and middle school cohort) and subgroup. In keeping 
with guidelines in the most recent What Work Clearinghouse Procedures and Standards Handbook (Version 
3.0), a clustering correction was applied to all statistically significant findings, including secondary contrasts 
(i.e., exploratory analyses). In addition, the overall and differential cluster-level attrition rate for each 
outcome was calculated and compared to the WWC allowable standards. 


= Based on guidelines in the most recent What Work Clearinghouse Procedures and Standards 
Handbook (Version 3.0), guidelines in Reviewer Guidance for Use with the Procedures and Standards 
Handbook (version 3.0), consultation with representatives from the WWC (See Figure A-2 and 
Figure A-3 in the Appendix), and the fact that the elementary school study was an RCT with low 
cluster-level attrition, it should receive a WWC Study Rating of “Meets WWC Group Design Standards 
without Reservations.” While the middle school study was also an RCT, none of the outcomes met the 
WWC cluster-level attrition standards. For the OE and PT sections, this was due to issues of 
differential attrition related to Phase 2 schools (no Phase 1 middle schools were lost to attrition). 
However, all but two outcomes demonstrated baseline equivalence, meaning the middle school study 
should receive a WWC Study Rating of “Meets WWC Group Design Standards with Reservations.” 


There were three exploratory outcomes for the elementary study that were statistically significant and 
positive after the cluster correction: The ELL subgroup on PASS OE and PT, and the IEP subgroup on the PT. 
Furthermore, it should be noted that there were no statistically significant or substantively important 
confirmatory or exploratory negative findings for the elementary study. Therefore, these three exploratory 
outcomes for the elementary study in favor of Phase 1 schools, which remained statistically significant after 
the cluster correction, should be considered meaningful and positive findings. In addition, the IEP (g = 0.39) 
and ELL (g = 0.30) PT exploratory outcomes had substantively important effect sizes. According to the What 
Work Clearinghouse Procedures and Standards Handbook (Version 3.0): 
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For the WWC, effect sizes of 0.25 standard deviations or larger are considered to be 
substantively important. Effect sizes at least this large are interpreted as a qualified positive 
(or negative) effect, even though they may not reach statistical significance in a given study 


(What Works Clearinghouse, 2014, p. 23). 


Based on these statistically significant and substantively important findings in the elementary study the SSEC’s 
LASER model can claim success in meeting its goal of improving student achievement. 
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All Regions: 
Results for Spring 2014 PASS 
Multiple Choice 
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All Regions Spring 2014 PASS Multiple Choice Key Findings for Phase 1 


After applying the cluster correction, and calculating the overall and subgroup cluster-level attrition, 
there were no statistically significant or substantively important PASS scaled score outcomes favoring 
Phase 1 elementary schools on the Spring 2014 PASS multiple choice section. It should be noted that 
while none of the middle school analyses (overall or by subgroup) met the WWC cluster-level attrition 
standard, all outcomes demonstrated baseline equivalence using the analytic samples. For the 
elementary cohort, only the IEP subgroup did not meet the WWC cluster-level attrition standard, but the 
subgroup did demonstrate baseline equivalence with the analytic sample. 
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Fall 2011 to Spring 2014 PASS Results: 
All Regions 


There were a total of 29 multiple choice questions on both the Fall 2011 and Spring 2014 forms of the 
PASS (PASS MC) addressing five broad science content standard categories for the elementary cohort and 
six broad science content standard categories for the middle school cohort. Scaled scores on the PASS 
MC for both elementary and middle schools range from 0-600. Only students who answered at least one 
multiple choice achievement question at both time points were included in the analyses for each 
respective area of analysis. 


PASS Multiple Choice: All Regions 
Table 3 shows the final cluster (i.e., school) and student sample sizes employed in the elementary cohort 


analyses (5" graders in 2013-2014) once students missing data on all 29 PASS MC questions at either time 
point were excluded. 


Table 3: PASS MC, Spring 2014: School and Student Samples for the PASS MC Analyses for the Elementary 
Cohort: All Regions 


Sample Phase 1 Phase 2 
Schools available for the PASS MC achievement analysis 51 43 
Students available for the PASS MC achievement analysis 2,338 1,785 


Table 4 shows the final school and student sample sizes employed in the middle school cohort analyses 
(8" graders in 2013-2014) once students missing all 29 PASS MC questions at either time point were 
excluded. 


Table 4: PASS MC, Spring 2014: School and Student Samples for the PASS MC Analyses for the Middle 
School Cohort: All Regions 


Sample Phase 1 Phase 2 
Schools available for the PASS MC achievement analysis 11 11 
Students available for the PASS MC achievement analysis 1,036 1,132 


To determine baseline equivalence on the Fall 2011 PASS MC scaled scores between Phase 1 and Phase 2 
for elementary and middle schools included in the present analysis, a series of independent t-tests were 
conducted using the analytic samples for all elementary and middle schools in the aggregate as well as 
for subgroups of designated by their Special Education (IEP) status, English Language Learner (ELL) status, 
Economically Disadvantaged (FRL) status, and Gender. In addition, an effect size was also calculated as a 
measure of baseline equivalence. 


As an indicator of the impact or “practical significance” of the treatment, the “effect size” (calculated as 
Hedges’ g) is a descriptive statistic that indicates the magnitude of the difference (in standard deviation 
units) between two measures. For example, a positive effect size would indicate a higher (i.e., better) 
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Phase 1 mean, while a negative effect size would indicate a higher (i.e., better) Phase 2 mean. Based on 
guidelines from the What Works Clearinghouse (WWC), a unit within the research division of the U.S. 
Department of Education, an effect size of +/- 0.25 is considered to be “substantively important” (What 
Works Clearinghouse, 2014). 


With respect to the elementary cohort (Table 5), in the aggregate (the “All” group), there was no 
statistically significant difference by Phase between schools in baseline achievement levels (t (4121) = - 
0.75, p =.45, g = -0.02, PR = 49) based on the analytic samples. At the same time, the ELL subgroup was 
the only one that demonstrated a statistically significantly difference in baseline achievement, with Phase 
2 ELL schools outperforming their Phase 1 counterparts, although based on the effect size (g), not toa 
substantively meaningful degree (t (926.9) = -2.36, p = .02, g =-0.15, PR = 44). Overall, there were no 
substantively important effect size differences for the elementary cohort schools, meaning there was 
baseline equivalence for all groups based on the analytic samples. 


Table 5: Baseline Comparison of Fall 2011 PASS MC Scaled Scores for Elementary Cohort Phase 1 
(Treatment) and Phase 2 (Control) Schools (N = 94): All Regions 


Treatment (Phase 1) Control (Phase 2) 
Grou 
Pp School Student M sp School Student M sp 
n n n n 
Elementary Cohort 
All st 2,338 312.02 = 101.33 43 1,785 314.39 98.11 : 0.02 49 


IEP 42 209 267.05 93.13 30 152 260.83 98.96 0.06 53 
ELL 45 537 263.82 91.57 38 418 277.31 84.24 Z 0.15 44 


FRL 47 1,416 284.79 94.85 40 1,060 289,39 90.29 : 0.05 48 


Female ol 1,157 310.55 97.49 43 887 311.97 96.89 : 0.01 49 


Note: PR = The percentile rank of the average Phase 1 student in the control group based on the effect size (g). For example, if the PR is 60, 
then the average Phase 1 student scored at the 60th percentile of the control group. 

Note: PASS MC scaled scores range from 0-600. 

*p<.05. 


Likewise, with respect to schools in the middle school cohort (Table 6), there was no statistically 
significant difference between schools in baseline achievement by Phase (t (2166) = 1.17, p =.24, g=0.05, 
PR = 52) in the aggregate based on the analytic samples. When the outcomes for the FRL subgroup were 
compared by Phase, there was a statistically significant difference in Fall 2011 PASS scores that favored 
Phase 1 schools, but the effect size linked to the comparison did not meet WWC criteria for substantive 
importance (i.e., g = 0.25) (t (1223.2) = 3.62, p< .01, g = 0.20, PR = 58). On the other hand, there was a 
Statistically significant difference in Fall 2011 PASS scores for the ELL subgroup, and the effect size 
associated with the difference met the WWC threshold for substantive importance, favoring Phase 1 
schools (t (181) = 3.30, p < .01, g = 0.49, PR = 69). Therefore, the outcome for the ELL subgroup 
comparison for the middle school cohort should be interpreted in light of the substantively important 
difference in baseline achievement between Phase 1 and Phase 2 schools. 

Employing these Fall 2011 data as covariates to statistically adjust the outcomes for baseline differences in 
achievement for the analytic sample, analyses were conducted on Spring 2014 PASS MC scaled scores to 
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determine differences between Phase 1 and Phase 2 elementary and middle schools, with scaled scores on 
the Spring 2014 PASS MC used as the outcome measure. 


Table 6: Baseline Comparison of Fall 2011 PASS MC Scaled Scores for Middle School Cohort Phase 1 
(Treatment) and Phase 2 (Control) Schools (N = 22): All Regions 


Treatment (Phase 1) Control (Phase 2) 
Grou 
p ae pipes M sp School Student M SD 
Middle School Cohort 

All 11 1,036 364.51 102.66 11 1,132 359.10 112.40 117 0.05 52 

IEP 10 111 282.22 96.57 6 114 276.25 112.07 0.43 0.06 52 

ELL 10 83 290.70 82.07 11 100 248.08 90.75 3.30* 0.49 69 

FRL ll 644 339.08 98.64 11 614 317.63 110.88 3.62* 0.20 58 
Female ll 531 367.68 99.85 11 562 357.67 108.00 159 0.10 54 


Note: PR = The percentile rank of the average Phase 1 student in the control group based on the effect size (g). For example, if the PR is 60, 
then the average Phase 1 student scored at the 60th percentile of the control group. 

Note: PASS MC scaled scores range from 0-600. 

*p<.05. 


Elementary and Middle School Cohort PASS Multiple Choice Analyses: All Regions 
With respect to the cohort of elementary schools in Phase 1 (n = 51) and Phase 2 (n = 43) and the cohort 


of middle schools in Phase 1 (n = 11) and Phase 2 (n =11), a set of ANCOVA analyses intended to 
generate pairs of adjusted scaled score means and to compute the treatment effect sizes (g) were 
conducted on the PASS MC outcomes for all elementary and middle schools by Phase within cohort, as 
well as for subgroups, categorized by their IEP status, ELL status, FRL status, and Gender (see Table 7 and 
Table 8). 


Elementary Cohort PASS Multiple Choice Spring 2014 Results: All Regions 
For the elementary cohort schools across the three regions, while the overall (i.e., the “All” group) 


ANCOVA adjusted scaled score mean presented in Table 7 was higher for Phase 1 schools (n = 51, 
Adjusted Mean = 435.80) compared to Phase 2 schools (n = 43, Adjusted Mean = 434.88), it also fell short 
of being statistically significant (F (1, 4116) = 0.15, p = 0.698, g = 0.01, PR = 50), and the effect size (g = 
0.01) was not substantively important. Consistent with these overall outcomes, two subgroup analyses 
(IEP and ELL) were linked to positively signed effect sizes that favored Phase 1 schools in the elementary 
cohort (see Table 7). Meanwhile, while the ELL subgroup in Phase 2 schools statistically significantly 
outperformed the ELL subgroup in Phase 1 schools at baseline, the ELL subgroup in Phase 1 schools had a 
higher adjusted mean scaled score on the posttest that ultimately fell short of being statistically 
significant or substantively important. Overall, none of the effect sizes for the ANCOVA analyses were 
large enough to be substantively important, ranging from a low of -0.03 (FRL and Female) to a high of 
0.19 (IEP). 
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Table 7: PASS MC, Spring 2014: Subgroup Mean Scaled Score Comparison for Elementary Cohort Phase 1 
(Treatment) and Phase 2 (Control) Schools (N = 94): All Regions 


Treatment (Phase 1) Control (Phase 2) 
School Student School Student 
ergot) n n M SD M SD 
All 51 2,338 435.28 88.72 435.80 43 1,785 435.56 88.76 43488 015 0.698 0.01 50 
IEP 42 209 392.08 104.38 390.08 30 152 366.49 116.11 369.23, 3.86 = 0.050 Ss 0.19 S58 
ELL 45 537 402.87 100.18 405.64 38 418 403.77. 10485 400.21 080 0370 0.05 52 
FRL 47 1,416 415.28 94.98 415.85 40 1,060 419.38 94.50 418.62 066 0.416 -0.03 49 
Female 51 1,157 434.92 84.14 435.33 43 887 438.12 84.10 437.59 050 0.481 -0.03 49 


Note: PR = The percentile rank of the average Phase 1 student in the control group based on the effect size (g). For example, if the PR is 60, 
then the average Phase 1 student scored at the 60th percentile of the control group. 

Note: PASS MC scaled scores range from 0-600. 

*p<.05 


Middle School Cohort PASS MC Spring 2014 Results: All Regions 
For the schools across the three regions in the middle school cohort, unlike the outcomes observed for 


the elementary cohort, the overall scaled score performance result for the ANCOVA analysis (i.e., the 
“All” group) shown in Table 8 was negative for middle school cohort Phase 1 schools (n = 11, Adjusted 
Mean = 323.02) compared to middle school cohort Phase 2 schools (n = 11, Adjusted Mean = 327.22), 
and was not statistically significant (F (1, 2161) = 1.35, p =0.246). In addition, the effect size (g = -0.04, 
PR = 48) was not substantively important. Furthermore, despite advantage of Phase 1 schools on the 
pretest for all subgroups, including a substantively important advantage of Phase 1 schools on the Fall 
2011 baseline for the ELL subgroup, Phase 2 outperformed Phase 1 for all subgroups. However, the 
effect size favoring the Phase 2 IEP subgroup (g = -0.28) was the only substantively important subgroup 
effect found, and indicated that the average Phase 1 student scored at the 39" percentile of the control 
group. Furthermore, none of the outcomes was statistically significant, including the outcome for the IEP 
subgroup, which was not statistically significant after applying the cluster correction. 


Table 8: PASS MC, Spring 2014: Subgroup Mean Scaled Score Comparison for Middle School Cohort Phase 1 
(Treatment) and Phase 2 (Control) Schools (N = 22): All Regions 


Treatment (Phase 1) Control (Phase 2) 
School Student School Student 

eget) ] ] We SD Wi SD 
Al Ad 086_ 323.75 100.85 323.02 3232655 106.00 327.22 135 0.246 0.0448 
IEP 10 111 220.02 124.36 217.99 6 114 250.53 124.45 = 252.50 5.91  0.016* 0.196  -0.28 39 
ELL 10 83 235.75 107.78 224.46 11 100 232.97 106.64 242.34 1.39 0.240 0.17 43 
FRL 11 644 299.72 111.57 293.04 11 614 293.58 109.45 300.58 2.25 0.134 0.07 47 
Female 11 531 333.00 10439 330.93 11 562 330.84 99.39 332.80 0.16 0.690 -0.02 49 


Note: PR = The percentile rank of the average Phase 1 student in the control group based on the effect size (g). For example, if the PR is 60, 
then the average Phase 1 student scored at the 60th percentile of the control group. 
Note: PASS MC scaled scores range from 0-600. 
*p<0.05. 
p*= Clustering-corrected statistical significance 
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All Regions: 
Results for Spring 2014 PASS 
Open-Ended and Performance Task 
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All Regions Spring 2014 PASS Open-Ended and Performance Task 
Key Findings for Phase 1 


After applying the cluster correction, and calculating the overall and subgroup cluster-level attrition, for 
all students combined (the “All” group) and the specified subgroups, the following outcomes (percentage 
correct) favoring Phase 1 elementary and middle schools were found on the Spring 2014 PASS Open- 
Ended/Constructed Response (OE), and Performance Task (PT) sections. It should be noted that while 
none of the middle school analyses (overall or by subgroup) met the WWC cluster-level attrition standard 
(due to differential attrition rates related to Phase 2 schools, as no Phase 1 schools were lost to attrition), 
all outcomes demonstrated baseline equivalence with the analytic samples except for the ELL subgroup 
on the Open-Ended section. For the elementary cohort, only the IEP subgroup did not meet the WWC 
cluster-level attrition standard, but the subgroup did demonstrate baseline equivalence with the analytic 
sample. 


ELL 


Elementary Cohort Open-Ended: Phase 1 schools had statistically significantly higher achievement than 
Phase 2 schools after the cluster correction. 


Elementary Cohort Performance Task: After controlling for the statistically significant advantage Phase 2 
schools demonstrated on the pretest (g = -0.18), Phase 1 schools demonstrated a statistically significant 
advantage over Phase 2 schools after the cluster correction that was also substantively important (g = 
0.30). 


Middle School Cohort Performance Task: Phase 1 schools had a substantially important advantage over 
Phase 2 schools on the posttest (g = 0.37). 


Economically Disadvantaged (FRL) 


Middle School Cohort Performance Task: Phase 1 schools outperformed Phase 2 schools with an effect 
size that was substantively important (g = 0.27). 


IEP 


Elementary Cohort Performance Task: Phase 1 schools demonstrated statistically significantly higher 
achievement than Phase 2 schools after the cluster correction. In addition, the effect size was 
substantially important (g = 0.39). 


Female 


Middle School Cohort Performance Task: Phase 1 schools outperformed Phase 2 schools with a nearly 
substantively important effect size (g = 0.23). 
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Spring 2014 PASS Open-Ended and Performance Task Results: 
All Regions 


Introduction 


A random sample of schools in the three regions took the PASS Open-Ended and Performance Task 
assessment for the first time in Spring 2012, (end of first posttest year) and again in Spring 2013 and Spring 
2014 (second and third posttest years, respectively). Students in the elementary cohort (5'" graders in 
2013-2014) responded to two Open-Ended (OE) and six Performance Task (PT) items, while middle schools 
(8" graders in 2013-2014) responded to six OE and six PT items. It should be noted that a random sample 
of schools in the HISD middle school cohort took the OE and PT sections for the first time in Spring 2013, 
and are therefore not included in these analyses. 


PASS Open Ended and Performance Task Scoring 


For the elementary cohort, there are a total of six points possible for the OE section and 17 total points 
possible for the PT section. For the middle school cohort, there are a total of 15 points possible for the 
OE section and 17 total points possible for the PT section. The items are scored using a rubric, with the 
number of points available for each item in each section shown in Table 9 below. In order to score a 
section, the student had to answer at least one item (i.e., gave a response that received a score of zero or 
higher). Otherwise, the section was dropped from the analysis if all the items were either missing, scored 
a “B” (blank), or had a combination of missing data and scores of “B”. If the section was scored, any item 
with a “B” and any missing items were given a value of zero. As a result, when a section was scored and a 
student had missing items or items scored with a “B”, those items were treated the same as the case 
where a student actually responded to an item, but received a score of zero, indicating the response did 
not contain any correct elements or was irrelevant. For both the OE and PT sections, the outcome score 
used in the analyses was the percentage correct out of the total number of points possible. 


Table 9: PASS OE and PT Scoring Scales, Spring 2012, Spring 2013, and Spring 2014 
Elementary Cohort Middle School Cohort 


Performance Task Open-ended Question 


Open-ended Question Performance Task 


1 B,0,1,2,3 1 B, 0, 1,2,3 1 B,0,1,2 1 B, 0, 1,2,3 

Se eee so ee ee 
pe a 

4 | B0123 | 4 «| B0,1,2,3 4 B,0,1,2,3 

5 | B0,1,23 | 23 5 B, 0, 1,2,3 
Er motos et hr 

sa 6 cars 7 pala 15 pas 7 
B= Bank 


Asummary of the Key Findings for each set of analyses is presented at the beginning of the report, followed 


by information on the samples included, baseline equivalence between the Phase 1 and Phase 2 schools, and 
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the detailed outcomes by grade level (i.e., elementary cohort and middle school cohort), outcome (PASS OE 
and PASS PT) and subgroup. 


A preliminary analysis was conducted on the Spring 2012 OE and PT sections of the PASS for students in 
the analytic sample who had a Spring 14 OE or PT percent correct score to determine baseline equivalence 
between Phase 1 and Phase 2 for elementary and middle schools included the present analysis (see Table 
10) as the PASS OE and PT sections were not administered until the end of the first posttest year, meaning 
there was no Fall 2011 baseline scores available. In addition, an effect size was also calculated as a 
measure of baseline equivalence. 

As an indicator of the impact or “practical significance” of the treatment, the “effect size” (calculated as 
Hedges’ g) is a descriptive statistic that indicates the magnitude of the difference (in standard deviation 
units) between two measures. For example, a positive effect size would indicate a higher (i.e., better) 
Phase 1 mean, while a negative effect size would indicate a higher (i.e., better) Phase 2 mean. Based on 
guidelines from the What Works Clearinghouse, a unit within the research division of the U.S. Department 
of Education, an effect size of +/- 0.25 is considered to be “substantively important” (What Works 
Clearinghouse, 2014). 


Results indicated that for the elementary cohort aggregate scores (i.e., for all students combined), there 
was no statistically significant difference between Phase 1 and Phase 2 schools on the Spring 2012 OE or 
PT percent correct, along with no substantially important effect sizes according to What Work 
Clearinghouse (WWC) standards. For the middle school cohort aggregate scores, Phase 1 schools had a 
Statistically significantly higher mean Spring 2012 OE percent correct, as well as Spring 2012 PT percent 
correct, with the magnitude of the effects for both being substantially important. 


Table 10: PASS OE and PT, Spring 2012, Treatment (Phase 1) and Control (Phase 2) School Percent Correct 
Means Comparison: All Regions 


Treatment (Phase 1) Control (Phase 2) 
School Student School Student 

BSY=Ye1 (0) 1) n n 7] 

Open-Ended Elementary 35 1,159 43.3 20.66 31 991 44,43 18.69 -1.37  -0.06 
Performance Task Elementary 35 1,326 53.68 19.76 32 1,099 54.41 17.35 -0.97 — -0.04 

Open-Ended Middle School 8 795 72.6 16.32 7 578 68.06 19.43 4.56* 0.26 
Performance Task Middle School 8 697 52.11 20.08 7 514 42.23 23.45 7.69% 0.46 
* 
p<0.05 


Due to the fact that the PASS OE and PT were not administered until the end of the first posttest year, 
meaning there were no true baseline scores available, and due to substantively meaningful differences 
on the Spring 2012 scores, correlation analyses were conducted to examine the relationship between the 
Spring 2014 PASS OE and PT percent correct and (1) the Spring 2012 PASS OE and PT percent correct, as 
well as (2) the Fall 2011 PASS Multiple Choice (MC) scaled score results, to determine which scores would 
serve as the better baseline measure of achievement. The analyses revealed statistically significant, but 
low correlations among each of the measures of achievement (see Table 11). For the schools in both the 
elementary and middle school cohorts, the Fall 2011 PASS MC scaled scores had higher statistically 
significant correlations with the Spring 2014 PASS OE and PT, compared to the Spring 2012 OE and PT. 
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Table 11: Correlations on the Percent Correct for Spring 2014 PASS OE and PT with Spring 2012 PASS OE 
and PT, and Fall 2011 PASS Multiple Choice for Phase 1 and Phase 2 Schools: All Regions 


Fall 2011 PASS Spring 2012 Spring 2012 
Spring 2014 PASS Multiple Choice Open-Ended Performance Task 
F _Elementary 0.37" 0.33* ns NA 
a caee cellaias Middle School 0.45* 0.38* NA 
Elementary 0.36" NA Ses 0.35* 
Spring 2014 Performance Task Middle School 0.39" NA 0.34" 


*p<0.05 


To determine baseline equivalence on the Fall 2011 PASS MC scaled score between the analytic samples 
in Phase 1 and Phase 2 elementary and middle schools included the present analyses, a series of 
independent t-tests were conducted for all elementary and middle schools in the aggregate as well as for 
subgroups identified by their Special Education (IEP) status, English Language Learner (ELL) status, 
Economically Disadvantaged (FRL) status, and Gender (see Table 12). For the analytic sample in the 
aggregate elementary OE cohort (i.e., the “All” group), Phase 2 schools demonstrated a statistically 
significant advantage over Phase 1 schools in their baseline achievement levels (t(2583) = -2.53, p = 
0.011, g = -0.10, PR = 46), but the effect size linked to this advantage did not meet WWC criteria for 
substantive importance (i.e., g 2 0.25). In addition to this overall difference in performance, a statistically 
significant, but not substantively important advantage was also observed to favor the analytic sample in 
the Female subgroup in Phase 2 schools in the elementary cohort. 


For schools in the middle school OE cohort, no statistically significant difference in aggregate 
performance (i.e., the “All” group) between the analytic sample in Phase 1 and Phase 2 schools was 
observed (t(1525) = 1.02, p = 0.309, g = 0.05, PR = 52), and the associated effect size did not meet the 
WWC criteria for substantive importance. Meanwhile, a statistically significant, but not substantively 
important advantage in baseline performance for the analytic sample was observed for the FRL subgroup 
in Phase 1 middle schools. Additionally, the ELL subgroup in Phase 1 schools had an advantage over 
Phase 2 schools that was not statistically significant, but was substantively important (g = 0.31). 


With respect to the elementary PT cohort in the aggregate (i.e., the “All” group), Phase 2 schools 
demonstrated a statistically significant advantage over Phase 1 schools in their baseline achievement 
levels for the analytic sample (t(2599) = -2.20, p = 0.028, g = -0.09, PR = 47), but the effect size linked to 
this advantage did not meet WWC criteria for substantive importance (i.e., g = 0.25). Consistent with this 
overall difference in performance, a statistically significant, but not substantively important advantage 
was observed to favor the ELL subgroup in Phase 2 schools. 


With respect to the middle school PT cohort, no statistically significant difference in aggregate 
performance (i.e., the “All” group) between the analytic samples in Phase 1 and Phase 2 schools was 
observed (t(1406) = -0.55, p = 0.582, g = -0.03, PR = 49), and the associated effect size did not meet the 
WWC criteria for substantive importance. No statistically significant or substantively important 
advantages in baseline performance were observed for the analytic samples in any of the four subgroups 
for middle schools. While neither the Spring 2012 PASS OE and PT nor the Fall 2011 PASS Multiple Choice 
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provided complete baseline equivalence between Phase 1 and Phase 2 schools, the Fall 2011 PASS 
Multiple Choice was administered as a true baseline assessment vs. the Spring 2012 PASS OE and PT, 
which was not administered until the end of the first posttest year. Therefore, due to its stronger 
relationship to the Spring 2014 PASS OE and PT outcomes, and because it was a true baseline measure, 
the Fall 2011 PASS Multiple Choice scaled score was chosen as the covariate (i.e., pretest measure) for 
both the elementary and middle school cohort analyses. 
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Table 12: Fall 2011 PASS Multiple Choice, Treatment (Phase 1) and Control (Phase 2) School Mean Scaled 
Score Comparison: All Regions 
Treatment (Phase 1) Control (Phase 2) 
School Student Schoo! Student 
Group n n M 
____Elementary Cohort - Open-Ended __ 
All 35 1,409 306.03 100.39 


316 98.81 
IEP 28 133 265.75 92.62 258.42 102.86 
ELL 32 370 263.68 92.01 7 277.96 86.41 
FRL 32 890 280.08 95.69 59 287.52 89.2 

Female 35 693 302.23 94.44 584 313.05 97.34 
Treatment (Phase 1) Control (Phase 2) 
School Student School Student 
n n M SD) n M SYD) 
__ Middle School Cohort - Open-Ended __ = i = ; 
832 368.45 103.45 


i) 
ee 


DOT NM 
00} CO 


(ee) 
rag 


py 


All 


~NI 


695 362.84 111.7 


IEP 83 275.46 90 6 71 269.59 120.65 
ELL 44 274.25 80.46 7 64 246.67 91.15 


8 
7 
7 
8 
8 


FRL 490 338.92 99.53 7 380 318.23 108.71 

Female 434 372.06 99.2 
Treatment (Phase 1) Control (Phase 2) 
School Student School Student 
Group n n M SD) n n M SD 
Elementary Cohort - Performance Task 

Al 351,429 308.05 101.59 

IEP 28 132 266.52 91.6 22 94 254.68 100.22 

ELL 32 371 263.52 92.41 

FRL 32 895 280.24 95.46 
Female 35 703 303.87 94,93 32 581 314.23 96.22 

Treatment (Phase 1) Control (Phase 2) 
School Student School Student 
Group n n M n n M SD 
_ lle School Cohort - Pe ‘ask 

All 8 772 365.64 0.55 -0.03 49 

IEP 7 84 271.93 0.53 

ELL 7 42 274.98 82.04 7 50 259.06 91.01 0.87 0.08 57 

FRL 8 338.95 1.76 
Female 8 405 368.36 102.13 7 328 362.54 105.82 0.75 0.06 52 


*p<0.05 
Note: PASS MC scaled scores range from 0-600. 


Employing the Fall 2011 PASS MC data as a covariate to statistically adjust the outcomes for baseline 
differences in achievement, preliminary analyses were conducted on Spring 2014 PASS OE and PT percent 
correct scores to determine any differences between Phase 1 and Phase 2 elementary and middle schools. 
As noted earlier, for the elementary cohort, there were statistically significant differences between Phase 
1 and Phase 2 schools on the baseline measures for both the OE and PT, with Phase 2 schools having an 
advantage both overall and for several subgroups. For the middle school cohort, the Phase 1 ELL subgroup 
had a substantively important advantage and the FRL subgroup had a statistically significant advantage on 
the OE section. Due to these baseline differences, results for these particular groups should be 
interpreted with these advantages in mind. 


SSEC i3 Validation Final Report of Confirmatory and Exploratory Analyses 22 


Elementary and Middle School Cohorts PASS Open-Ended Analyses: All Regions 


A set of ANCOVA analyses intended to generate pairs of adjusted percentage correct scores and to 
compute the treatment effect sizes (g) was conducted by Phase within cohort on the PASS OE outcomes 
for all elementary (Phase 1 n = 35, Phase 2 n = 31) and middle schools (Phase 1 n= 8, Phase 2 n= 7), as 
well as for subgroups, categorized by their Special Education (IEP) status, English Language Learner (ELL) 
status, Economically Disadvantaged (FRL) status, and Gender. 


Elementary Cohort Spring 2014 PASS Open-Ended Results: All Regions 


For the elementary cohort schools across the three regions, after applying the cluster correction, the 
ANCOVA adjusted means presented in Table 13 indicated no statistically significant difference between 
Phase 1 and Phase 2 schools overall (i.e., the “All” group) (F (1, 2578) = 6.32, p* = 0.240, g = 0.09, PR = 
54). In addition, the magnitude of the effect size for the All group (g = 0.09) was not considered to be 
substantively important. It should be noted that on the pretest for this group, Phase 2 schools had a 
Statistically significant advantage over Phase 1 schools, although it was not considered substantively 
important. Only the ELL subgroup demonstrated a statistically significant difference after the cluster 
correction, which favored Phase 1 schools (F (1, 610 = 6.70, p* = 0.043, g = 0.20, PR = 58). Furthermore, 
even though Phase 2 schools had an advantage on the pretest overall and for all but the IEP subgroup, 
after controlling for pretest differences, Phase 1 schools outperformed Phase 2 schools on the posttest 
for all groups, although no posttest effect size was substantively important. 


Table 13: PASS Open-Ended Questions, Spring 2014: Mean Percent Correct Comparison of Phase 1 
(Treatment) and Phase 2 (Control) Elementary Cohort Schools (N = 66): All Regions 


Treatment (Phase 1) Control (Phase 2) 
School School School School 
Group n M SD Adj. M n SD 
All 35 1,409 65.87 21.09 66.39 31 1,176 65.12 20.18 64.50 6.32 0.012* 0.240 0.09 54 
IEP 28 133 56.27 22.99 55.97 21 90 51.48 22.16 51.92 191 0.168 0.18 57 
ELL 32 370 61.89 22.38 62.38 28 247 58.77 20.54 58.05 6.70 0.010* 0.043* 0.20 58 
FRL 32 890 63.43 21.69 63.69 28 659 61.51 20.35 61.15 6.34 0.012* 0.134 0.12 55 
Female 35 693 67.68 20.07 68.21 31 584 67.01 19.47 66.37 3.22. 0.073 0.09 54 
OSs me ee eee Ce 


p= Clustering-corrected statistical significance 


Middle School Cohort Spring 2014 PASS Open-Ended Results: All Regions 


Across the middle schools in the three regions, the ANCOVA adjusted means presented in Table 14 
indicated no statistically significant difference between Phase 1 and Phase 2 schools overall (i.e., the “All” 
group) (F (1, 1520) = 0.51, p = 0.477, g = 0.03, PR = 51). While no subgroup comparison was statistically 
significant, the ELL subgroup produced an effect size that was substantively important favoring Phase 2 
schools (g = -0.32) (with the ELL subgroup in Phase 1 schools having an advantage (g = 0.31) on the 
pretest). 
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Table 14: PASS Open-Ended Questions, Spring 2014: Mean Percent Correct Comparison of Phase 1 
(Treatment) and Phase 2 (Control) Middle School Cohort Schools (N = 15): All Regions 


Treatment (Phase 1) Control (Phase 2) 
School Student Adj. School Student 
n n SD 
All 8 832 85.32 15.49 85.08 7 695 84.32 15.32 84.60 0.51 0.477 0.03 51 
IEP 7 83 70.92 20.04 70.28 6 71 69.48 23.55 70.24 0.00 0.992 0.00 50 
ELL 7 44 66.67 19.40 66.25 if 64 72.40 20.12 72.68 2.77 (0.099 -0.32 37 
FRL 8 490 82.91 16.74 63.69 re 380 80.02 17.01 61.15 1.09 0.297 0.15 56 
Female 8 693 67.68 20.07 68.21 7 584 67.01 19.47 66.37 0.36 0.549 0.09 54 
*p<0.05. 


Elementary and Middle School Cohorts PASS Performance Task Analyses: All Regions 


A set of ANCOVA analyses intended to generate pairs of adjusted percentage correct scores and to 
compute the treatment effect sizes (g) was conducted by Phase within cohort on the PASS PT outcomes 
for all elementary (Phase 1 n = 35, Phase 2 n = 32) and middle schools (Phase 1 n= 8, Phase 2 n = 7), as 
well as for subgroups categorized by their Special Education (IEP) status, English Language Learner (ELL) 
status, Economically Disadvantaged (FRL) status, and Gender. 


Elementary Cohort Spring 2014 PASS Performance Task Results: All Regions 
For the elementary cohort schools across the three regions, the ANCOVA adjusted means presented in 


Table 15 demonstrate no statistically significantly difference between Phase 1 and Phase 2 schools 
overall (i.e., the “All” group) after applying the cluster correction (F (1, 2594) = 6.28, p* = 0.355, g = 0.09, 
PR = 54), indicating that the average Phase 1 student scored at the 54" percentile of the control group. 
In addition, the effect size was not considered to be substantively important according to WWC 
standards. On the other hand, both the IEP (F (1, 219) = 10.16, p” = 0.014, g = 0.39, PR = 65) and ELL (F 
(1, 602) = 15.54, p* = 0.039, g = 0.30, PR = 62) subgroups in Phase 1 schools demonstrated statistically 
significant outcomes after the cluster correction. Furthermore, effect sizes for both subgroups were 
substantively important. The IEP subgroup, however, did not meet the WWC cluster-level attrition 
standard, but did demonstrate baseline equivalence with the analytic sample. 


Meanwhile, even though Phase 2 schools had a statistically significant advantage on the pretest overall 
and for the ELL subgroup, for both groups, after controlling for statistically significant pretest differences 
(All, g = -0.09, and ELL, g = -0.18), there was no statistically significant difference between Phase 1 and 
Phase 2 schools on the posttest. Furthermore, both the IEP (g = 0.39) and ELL (g = 0.30) subgroups had 
substantively important posttest effect sizes. In addition, after controlling for the advantage of the Phase 
2 FRL (g = -0.09) and Female (g =-0.11) subgroups on the pretest, the Phase 1 FRL and Female subgroups 
were able to demonstrate small, but positive effect sizes on the posttest (g = 0.14 and g = 0.06 
respectively). 
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Table 15: PASS Performance Task Questions, Spring 2014: Mean Percent Correct Comparison of Phase 1 (Treatment) and Phase 2 (Control) 
Elementary Cohort Schools (N = 67): All Regions 


Treatment (Phase 1) 


Control (Phase 2) 


RYasTole) Student RYasTole)| Student 
M Ry) Adj. M n n Ky?) 
All 35 1,429 66.16 15.50 66.55 32 1,172 65.56 16.82 65.09 6.28 0.012* 0.355 0.09 54 
IEP 28 132 60.29 16.26 59.96 22 94 52.25 21.16 52.72 10.16 0.002* 0.014* 0.39 65 
ELL 32 371 63.20 14.85 63.55 29 238 59.17 17.95 58.63 15.54 <0.001* 0.039* 0.30 62 
FRL 32 895 63.75 15.47 63.95 29 654 61.93 17.56 61.66 8.37 0.004* 0.157 0.14 56 
Female 35 703 67.15 14.82 67.57 32 581 67.22 16.26 66.70 1.18 0.278 0.06 52 
*D<0.05. 


p= Clustering-corrected statistical significance 
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Middle School Cohort Spring 2014 PASS Performance Task Results: All Regions 
Across the middle schools in the three regions, the ANCOVA adjusted means presented in Table 16 


demonstrate no statistically significant difference between Phase 1 and Phase 2 schools overall (i.e., the 
“All” group) after applying the cluster correction (F (1, 1401) = 19.09, p <0.001, p* = 0.516, g = 0.12, PR= 
58), indicating that the average Phase 1 student scored at the 58" percentile of the control group. In 
addition, the effect size was not considered to be substantively important according to WWC standards. 
Furthermore, while there were no statistically significant differences for any subgroups after applying 
the cluster correction, for the All and IEP subgroups, after controlling for a Phase 2 advantage on the 
pretest (g = -0.55 and g = -0.53 respectively), Phase 1 schools outperformed Phase 2 schools on the 
posttest (g = 0.12 and g = 0.19 respectively). Meanwhile, although Phase 1 schools had advantages on 
the pretest for three additional subgroups (ELL, FRL, and Female) that were neither statistically 
significant nor substantially important (ELL, g = 0.08, FRL, g = 0.13, and Female, g = 0.06) the Phase 1 
advantages were even stronger on the posttest with substantially important effects for the ELL (g = 0.37) 
and FRL subgroups (g = 0.27), and nearly substantially important effects for the Female subgroup (g = 
0.23). 
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Table 16: PASS Performance Task Questions, Spring 2014: Mean Percent Correct Comparison of Phase 1 (Treatment) and Phase 2 (Control) Middle 
School Cohort Schools (N = 15): All Regions 


Treatment (Phase 1) Control (Phase 2) 


Al 8 772 58.64 24.49 58.81 7 636 53.95 23.01 53.74 19.09 <0.001* 0.516 0.12 58 

IEP 7 84 40.97 20.43 41.02 5 61 37.32 19.65 37.25 1.28 0.260 0.19 57 

ELL 7 42 45.10 20.45 44.48 7 50 36.94 18.45 37.21 3.46 0.066 0.37 65 

FRL 8 A465 55.75 23.31 55.26 7 338 48.45 21.70 49.13 17.19 <0.001* 0.170 0.27 61 

Female 8 405 61.68 24.33 61.40 7 328 55.69 22.94 56.04 11.21 0.001* 0.287 0.23 59 
*D<0.05 


p= Clustering-corrected statistical significance 


SSEC i3 Validation Final Report of Confirmatory and Exploratory Analyses 27 


References 


Schochet, P. Z. (2008). Technical Methods Report: Guidelines for Multiple Testing in Impact 
Evaluations (NCEE 2008-4018). Washington, DC: National Center for Education Evaluation 
and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. 
Retrieved from http://ies.ed.gov/ncee/pdf/20084018.pdf 


Wasserstein, R.L. & Lazar, N.A. (2016). The ASA's Statement on p-Values: Context, Process, and Purpose. 
The American Statistician, 70:2, 129-133. doi: 10.1080/00031305.2016.1154108 


What Works Clearinghouse (2014). Procedures and standards handbook (Version 3.0). Washington, DC: 
Author. Retrieved from ies.ed.gov/ncee/wwc/pdf/reference_resources/ 
wwc_procedures v3 0 standards _handbook.pdf 


What Works Clearinghouse (2016). Reviewer Guidance for Use with the Procedures and Standards 


Handbook (version 3.0). Washington, DC: Author. Retrieved from 
http://ies.ed.gov/ncee/wwc/pdf/reference_resources/wwc_reviewer_guidance_030416.pdf 


SSEC i3 Validation Final Report of Confirmatory and Exploratory Analyses 28 


Appendices 


SSEC i3 Validation Final Report of Confirmatory and Exploratory Analyses 29 


Table A-1: Clustering Correction for Mismatched Analyses 
Due to the fact that the random assignment was carried out at the cluster level (i.e., school level), but the analyses were conducted at the 
student level, a clustering correction was applied to the p-values for statistically significant outcomes of the ANCOVA analyses to calculate 
clustering-corrected statistical significance levels (p-values). 


Treatment Control 
(Phase 1) (Phase 2) 
Subtest Group n Uncorrected p df Corrected p 
Elementary Cohort 
PASS OE All 1,409 1,176 0.09 0.012 94 0.10 2.278617 | 1.1744776 | 2014.564 0.240 
PASS OE ELL 370 247 0.20 0.010 83 0.07 | 2.434091) 2.0241855 | 596.9861 0.043* 
PASS OE FRL 890 659 0.12 0.012 87 0.08 | 2.335034) 1.5001646 | 1384.108 0.133 
PASS PT All 1,429 1,172 0.09 0.012 94 0.19 2.283769 | 0.9253471 | 1332.025 0.354 
PASS PT IEP? 132 94 0.39 0.002 72 0.17 2.889757 | 2.4783267 | 211.5658 0.013* 
PASS PT ELL 371 238 0.30 <0.001 83 0.32 | 3.612335 |} 2.0716161 |} 369.8459 0.038* 
PASS PT FRL 895 654 0.14 0.004 87 0.16 2.721465 | 1.4145159 | 1086.117 0.157 
Middle School Cohort 
PASS MC JEP" 111 114 -0.28 0.016 16 | 0.12 | -2.09981 | -1.2978464 | 189.7401 0.195 
PASS PT All 772 636 0.12 <0.001 22 0.17 2.240872 | 0.6498379 | 518.5035 0.516 
PASS PT FRL* 465 338 0.27 <0.001 22 0.18 3.777381 | 1.3750417 | 381.8819 0.169 
PASS PT | Female? 405 328 0.23 0.001 22 0.23 3.09628 | 1.0658156 | 285.8985 0.287 


1 Subgroup did not meet the WWC cluster-level attrition standard. All subgroups demonstrated baseline equivalence. 
* p < 0.05 after clustering correction 
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Table A-2: Elementary School Cohort Cluster-Level Attrition 


ELL 


Females 


Cluster Category 


Total 


Phase 
1 


MC 


Phase 2 


Dropped 


Attrition Rate 


Differential Attrition 


Final Count 
Total 


Dropped 


Attrition Rate 


Differential Attrition 


Final Count 

Total 52 48 100 35 33 68 35 33 68 
Dropped 7 10 17 3 5 8 3 4 7 
Attrition Rate 13.5% 20.8% 17.0% 8.6% 15.2% 11.8% 8.6% 12.1% 10.3% 


Differential Attrition 


Final Count 
Total 


Dropped 


Attrition Rate 


Differential Attrition 


Final Count 
Total 


Dropped 


Attrition Rate 


Differential Attrition 


Final Count 


Note: Only the IEP subgroup did not meet the WWC cluster-level attrition standard. 
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Table A-3: Middle School Cohort Cluster-Level Attrition 


ELL 


Females 


MC 
Cluster Category Phase 
Phase 2 Overall 
Total 
Dropped 
Attrition Rate 
Differential Attrition 
Final Count 
Total 
Dropped 
Attrition Rate 16.7% 50.0% 16.7% 
Differential Attrition 33.3% 16.7% 
Final Count 10 7 7 
Total 12 17 29 7 8 15 7 8 15 
Dropped 2 6 8 0 1 1 0 1 1 
Attrition Rate 16.7% 35.3% 27.6% 0.0% 12.5% 6.7% 0.0% 12.5% 6.7% 
Differential Attrition 18.6% 12.5% 12.5% 
Final Count 10 11 21 7 14 7 14 


Total 


13 


Dropped 


Attrition Rate 


15.4% 


Differential Attrition 


Final Count 
Total 


13 


Dropped 


2 


Attrition Rate 


15.4% 


Differential Attrition 


Final Count 


11 


Note: Neither the Overall sample or any subgroup met the WWC cluster-level attrition standard. 
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Table A-4: What Works Clearinghouse (WWC) Allowable Overall and Differential Attrition Rates 


Differential Attrition Differential Attrition Differential Attrition 
Overall Conservative Liberal Overall Conservative Liberal Overall Conservative Liberal 
Attrition Boundary Boundary Attrition Boundary Boundary Attrition Boundary Boundary 

0 57 10.0 22 52 97 44 20 5.1 
1 58 10.1 23 5.1 95 45 18 49 
2 59 10.2 24 49 94 46 1.6 46 
3 59 10.3 25 48 92 47 1.5 44 
4 6.0 10.4 26 47 90 48 13 42 
5 6.1 10.5 27 45 88 49 1.2 3.9 
6 62 10.7 28 44 86 50 1.0 37 
7 63 10.8 23 43 84 51 09 3.5 
8 63 10.9 x 41 82 52 07 3.2 
9 63 10.9 31 40 8.0 53 06 3.0 
10 63 10.9 32 38 78 54 04 28 
11 62 10.9 33 3.6 76 55 0.3 26 
12 62 10.9 4 35 74 56 0.2 23 
13 6.1 10.8 35 3.3 72 57 0.0 2.1 
14 6.0 10.8 % 3.2 70 58 - 19 
15 59 10.7 37 3.1 67 59 - 1.6 
16 59 10.6 38 29 65 60 - 1.4 
17 58 10.5 39 28 63 61 - 11 
18 57 10.3 40 26 60 62 - 0.9 
19 55 10.2 4 25 58 63 - 0.7 
20 54 10.0 42 23 56 64 - 0.5 
21 53 99 4B 21 53 65 - 0.3 


Note: Reproduces Table IIl.1 in the WWC Procedures and Standards Handbook (Version 3.0): Highest 
Differential Attrition for a Sample to Maintain Low Attrition, by Overall Attrition, Under Liberal and 
Conservative Assumptions 
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Figure A-1: Letter from What Works Clearinghouse (WWC) Confirming Appropriateness of Reviewing the 
Report as Two Separate Studies 


See the following pages. 
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Se 
What Works Clearinghouse W/WIC 


A central and trusted source of scientific evidence for what works in education. 


June 21, 2016 


Dr. Carol O’Donnell 

Smithsonian Science Education Center 
Smithsonian Institution 

901 D Street NW, Suite 704-B, MRC 952 
Washington, DC 200024 
odonnellc@si.edu 


Dear Dr. O’Donnell, 


Thank you for your email concerning the What Works Clearinghouse (WWC) review of M. Alberg’s 
report entitled, “The LASER Model: A systemic and sustainable approach for achieving high 
standards in science education” (Released July 2015). In response to your inquiry, we conducted an 
in-depth, independent quality review to address the issues you raised. 


The WWC quality review team responds to concerns raised about WWC reviews published on our 
website. When a quality review is conducted, a senior WWC researcher who was not involved in the 
initial review undertakes an independent assessment of the study in question. The senior reviewer 
also investigates the procedures used and decisions made during the original review of the study. 
These quality reviews are one of the tools used to ensure that the standards established by the 
Institute of Education Sciences are upheld on every review conducted by the WWC. 


In your email dated June 4, 2016, you stated that the Alberg (2015) report should have been 
reviewed as two separate studies: one for the elementary school cohort and a second for the 
middle school cohort. You stated that these two cohorts were described and analyzed separately in 
the original report, and thus should be reviewed as two separate studies. 


We investigated this issue in the quality review, revisited our review of the report, and came to the 
following conclusion: it is most appropriate to review this report as two separate studies. 


The WWC Procedures and Standards Handbook (version 3.0) defines a study as: 


“..the examination of the effect of an intervention on a particular sample (e.g., a set 
of students, schools, or districts) and set of outcomes. To be a separate study, the 
sampling errors must be independent. For randomized controlled trials, a study is 
defined by randomization. This definition excludes subgroups from being their own 
studies because they were randomized at the same time as the full sample and 
treats additional cohorts without rerandomization of the unit of assignment as a 


whatworks.ed.gov ¢ PO Box 2393, Princeton, NJ 08543-2393 ¢ 1-866-503-6114 


aa 
What Works Clearinghouse W/WIC 


A central and trusted source of scientific evidence for what works in education. 


single study; however, if the same units were rerandomized to condition, then they 
are separate studies.” (p. 7) 


The original WWC review of the Alberg report was completed in September of 2015. In March of 
2016, the WWC published an additional document providing Reviewer Guidance for Use with the 
Procedures and Standards Handbook (available online at 


http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=260). This new Guidance document further 


clarifies that: 


“..if a single research team reports pooled impacts combining distinct randomized 
samples, that will be considered one study. The reviewer should classify a sample 
(and corresponding analysis) as one study if the intervention and comparison groups 
were formed in either one randomization process or multiple processes reported as 
pooled findings by one team. The reviewer should classify a sample (and 
corresponding analysis) as n studies if the intervention and comparison groups were 
formed inn randomization processes and not combined...” (p. 4) 


In the reviewed study, the research team used a matched-pair randomization design where 
elementary schools were paired and randomized to conditions, and, through a separate process, 
middle schools were paired and randomized to conditions. Because the intervention and 
comparison groups were formed via different randomization processes for the elementary and 
middle schools, and given that the report presented findings separately for all elementary and 
middle school cohorts, the independent quality review confirmed that this report should be 
reviewed as two separate studies. 


The WWC updates the reviews of all studies when the results of a QRT indicate a revision is 
necessary. As a result of this independent quality review, the WWC will update the review of the 
Alberg (2015) report to reflect that two studies are contained within the report. 


| hope that this letter has addressed your concerns. If you have other concerns, please do not 
hesitate to contact the WWC through info@whatworks.ed.gov. 


Sincerely, 


a, Y_. 
We 


& ee aes a 
ks uk, (ANteEsrilh 


Emily E. Tanner-Smith, Ph.D. Joshua R. Polanin, Ph.D. 
What Works Clearinghouse at Development Services Group, Inc. 


whatworks.ed.gov ¢ PO Box 2393, Princeton, NJ 08543-2393 ¢ 1-866-503-6114 


Figure A-2: What Works Clearinghouse (WWC) Email Chain Regarding Cluster-Level Inference 


See the following pages. 
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Todd Alan Zoblotsky (tzbltsky) 


From: WhatWorks <What.Works@icfi.com> 

Sent: Tuesday, August 23, 2016 3:56 PM 

To: Todd Alan Zoblotsky (tzbltsky) 

Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


Dr. Zoblotsky, 


Yes, that is correct. A cluster only contributes to cluster-level attrition if it met the requirements to be included in the 
subsample at the time of randomization. 


Thank you, 
What Works Clearinghouse 


From: Todd Alan Zoblotsky (tzbltsky) [mailto:tzbltsky@memphis.edu] 
Sent: Tuesday, August 23, 2016 11:10 AM 

To: WhatWorks 

Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


Thank you very much for your response. Just to clarify, in the case of attrition for a specific subgroup (e.g., 
ELL) at the cluster level, our starting point for attrition would only be clusters that had students in that 
particular subgroup, correct? In other words, we do not start with the full sample of clusters, and if a school 
does not have any students in a particular subgroup (e.g., ELL), count that school against our attrition for that 
subgroup, correct? My assumption is that for cluster-level attrition, the school is counted against our attrition 
only if all students in the particular subgroup (e.g., ELL) who had baseline data did not have outcome data. Am 
| understanding this correctly? 


Thanks, 


Todd 


Todd A Zoblotsky, Ed.D. 
Research Associate Professor 
Center for Research in Educational Policy 


From: WhatWorks [mailto:What.Works @icfi.com] 

Sent: Tuesday, August 23, 2016 10:35 AM 

To: Todd Alan Zoblotsky (tzbltsky) <tzbltsky@memphis.edu> 
Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


Dear Dr. Zoblotsky, 


Cluster-level attrition is assessed separately for each subgroup. However, if students in a subgroup are represented in all 
clusters at randomization and follow-up, attrition within the subgroup would be identical to attrition in the full sample. 


Thank you, 
What Works Clearinghouse 


From: Todd Alan Zoblotsky (tzbltsky) [mailto:tzbltsky@memphis.edu] 
Sent: Tuesday, August 23, 2016 6:54 AM 

To: WhatWorks 

Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


Thank you very much for your response. 


If | could ask a follow up question: As the Reviewer Guidance documents states (p. 5): “If the study is making 
cluster-level inferences, the study can Meet WWC Group Design Standards without Reservations. The impact 
estimates reflect the combination of the effect on individuals who were in the clusters at the time of 
assignment and the effect on the composition of individuals within the clusters over time, which is noted in 
our reporting. i 


Does this mean that attrition at the cluster level is only assessed for the overall sample (all students 
combined), or must the cluster level attrition also be assessed for each subgroup analysis reported (e.g., ELL, 
Students with a Disability)? 


Thank You, 


Todd 


Todd A Zoblotsky, Ed.D. 
Research Associate Professor 
Center for Research in Educational Policy 


From: WhatWorks [mailto:What.Works @icfi.com] 

Sent: Monday, August 22, 2016 5:20 PM 

To: Todd Alan Zoblotsky (tzbltsky) <tzbltsky@memphis.edu> 
Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


Dear Dr. Zoblotsky, 


The unit of analysis does not affect how the WWC treats joiners in reviews of cluster-level randomized control trials 
(RCTs). As clarified in the WWC guidance for the standards, the key issue is whether the study authors make claims 
about the effect of the intervention on outcomes for subcluster units, or only on cluster-level outcomes. Therefore, it is 
possible for an RCT with cluster-level assignment and that uses language consistent with cluster-level inferences to Meet 
WWC Group Design Standards Without Reservations regardless of whether the analysis was conducted using subcluster- 
or cluster-level units. 


For example, if the authors of an RCT with school-level assignment make claims about the effect of the intervention on 
students, then the presence of students in the analytic sample who were not in schools at the time of random 
assignment jeopardizes the random assignment. However, if the authors only make claims about the effect of the 
intervention on schools then joiners are not considered a threat to inference. By definition, the effect of the intervention 
on school-level outcomes includes the possibility that the intervention affected the composition of the students within 
the schools. The unit of analysis has no relevance to these issues. 


2 


The unit of inference can affect the rating in studies that must demonstrate equivalence. In high-attrition cluster RCTs 
which make student-level inferences, baseline equivalence and statistical adjustment (if needed) must be performed 
using subcluster-level data. If the study makes cluster-level inferences, these could be done using either subcluster- or 
cluster-level data. 


Finally, you also mentioned the relationship between the unit of analysis and statistical power. Please note that the 
WWC will apply a post hoc clustering correction to determine the statistical significance of findings from a subcluster- 
level analysis that did not appropriately adjust for clustering. 


We hope this information is helpful. 


Thank you, 
What Works Clearinghouse 


From: Todd Alan Zoblotsky (tzbltsky) [mailto:tzbltsky@memphis.edu] 
Sent: Friday, August 19, 2016 3:23 PM 

To: WhatWorks 

Subject: RE: What Works Clearinghouse (WWC 6154 6157) 


If | could add one more thing to my second question about cluster level analysis: 


Reading the reviewer guidance document (p. 5), which | think would be the “final word” on this issue (?) they 
are clearly talking about how the analyses are discussed vs. analyzed: 


“For an analysis of stayers and joiners, the rating depends on the type of inference described in the study, 


. To understand the inferences the authors are making, read how they 
improvement in school scores, then it is a cluster-level inference. 
increases in student scores, then it is an individual-level inference.” 


If they are 


So this leads me to believe that for cluster level analysis/inference, it is how the analyses are discussed vs. 
conducted (i.e., it is acceptable to have individual level analysis with cluster level inference). 


Thank You, 


Todd 


Todd A Zoblotsky, Ed.D. 
Research Associate Professor 
Center for Research in Educational Policy 


From: WhatWorks [mailto:What.Works@icfi.com] 
Sent: Friday, August 19, 2016 2:31 PM 


To: Todd Alan Zoblotsky (tzbltsky) <tzbltsky@memphis.edu> 
Subject: What Works Clearinghouse (WWC 6154 6157) 


Hello, 


Thank you for contacting the What Works Clearinghouse (WWC). We have received your two emails below. WWC staff 
are reviewing your request and we get back to you soon. 


If you would like to be notified of new publications and updates to the WWC website, you can sign up for the News Flash 
email updates from the Institute of Education Sciences (IES). Please visit the link 
http://ies.ed.gov/newsflash/?url=%2Fncee%2Fwwc%2Findex%2Easp&site=What+Works+Clearinghouse, choose the 
National Center for Education Evaluation (NCEE) option (the second one), and select What Works Clearinghouse. 


Thank you, 


What Works Clearinghouse 


The What Works Clearinghouse was established by the U.S. Department of Education’s Institute of Education Sciences to provide educators, policymakers, 
researchers, and the public with a central and trusted source of scientific evidence of what works in education. For more information, please visit 
http://ies.ed.gov/ncee/wwc/. 


From: t.zoblotsky@memphis.edu [mailto:t.zoblotsky@memphis.edu] 

Sent: Friday, August 19, 2016 1:39 PM 

To: info@whatworks.ed.gov 

Subject: IES WWC Website: Contact Us: Procedures and Standards Handbook, Reference ID Number: 1280259576 


info@whatworks.ed.gov, this email was automatically sent through the Contact link on the WWC website. 
From: t.zoblotsky@memphis.edu 


Message: | need clarification on the following statement on page 10 of the current WWC handbook: 
&quot;Furthermore, a study with cluster-level assignment and cluster-level analysis may have changes in subcluster 
composition that are not subject to the attrition standard. A cluster-level analysis of stayers and joiners used to answer a 
cluster-level research question may Meet WWC Group Design Standards without Reservations. If the analysis is 
conducted at the individual level, any nonrandom movement or placement of individuals into the intervention or 
comparison groups after random assignment jeopardizes the random assignment design of the study. Individual-level 
studies of stayers or stayers plus joiners may Meet WWC Group Design Standards with Reservations if the study is able 
to demonstrate baseline equivalence of the analytic sample.&quot; Does this literally mean that for a cluster-level 
analysis that includes both stayers and joiner to Meet WWC Group Design Standards without Reservations, you must 
aggregate individual (student) data up to the cluster (school) level and do the analysis of school level means? This seems 
questionable due to the loss of power in moving from the student to the cluster level. In addition, how do control for 
covariates such as Special Education status or ELL status when you aggregate data from the individual to the cluster 
(school level)? Is it possible to have a study with cluster level assignment and cluster level research questions and 
inference with analysis at the individual level Meet WWC Group Design Standards without Reservations? 


From: t.zoblotsky@memphis.edu [mailto:t.zoblotsky@memphis.edu] 

Sent: Thursday, August 18, 2016 4:22 PM 

To: info@whatworks.ed.gov 

Subject: IES WWC Website: Contact Us: Procedures and Standards Handbook, Reference ID Number: 1250124092 


info@whatworks.ed.gov, this email was automatically sent through the Contact link on the WWC website. 

From: t.zoblotsky@memphis.edu 

Message: | have a question about how to apply the Benjamini-Hochberg correction for multiple comparisons in my 
situation. | have one outcome domain with three measures. For each of the three outcomes, | ran 9 separate statistical 


tests (All students combined, and 8 subgroups). Each analysis was run separately, so | have a total Of 27 statistical tests 
4 


that were done (3 outcomes x 9 groups). My question is: Should the total number of outcomes (M) | use for the 
correction be 27 (based on running 27 individual statistical tests) or should it be 3 (based on having 3 outcome 
measures)? 


